class: center, middle, inverse, title-slide .title[ # 2023 Women in Statistics and Data Science Conference ] .subtitle[ ## Mortality Rates from Violent Deaths by Racial and Ethnic Groups in the United States, 2016-2020 ] .author[ ###
Ying-Ju Tessa Chen, PhD
(This is joint work with Dr. Tatjana Miljkovic.)
Associate Professor
Department of Mathematics
University of Dayton
@ying-ju
ying-ju
ychen4@udayton.edu
] .date[ ### October 27, 2023 ] --- ## Research Pathways on Violent Deaths in the U.S. .pull-left[ - .blue[CDC Reports (Wilson, Liu, Lyons, Petrosky, Harrison, Betz, and Blair (2022)):] * Surveillance of national violent deaths * Data insights: + 2019: 51,627 violent deaths + Composition: - Suicides: 64.1% - Homicides: 25.1% - Deaths of undetermined intent: 8.7% - Legal intervention deaths: 1.4% - Unintentional firearm deaths: < 1.0% ] .pull-right[ .red[Identified gaps:] - No cross-sectional studies on causes & sociodemographic factors - No studies on mortality rates over time based on combinations (e.g., age & sex) - No statistical model building efforts on CDC data - .blue[Focus on Individual Manner of Violent Deaths] - .blue[Comparative Studies with Other Countries] ] --- ## Our Study Focus - Validate results (e.g., homicides by age and race). - Examine individual violent death manners combined with sociodemographic factors over time. - Study scope: U.S. only, not comparing with other countries. --- ## Objective .pull-left[ - Investigate mortality rates in underrepresented vs. white communities. - Examine whether rates differ between these communities. 
- Assess significance of: + Sex + Race + Age + Manner of death ] .pull-right[ - Analyze relative risk for manners of death: + Suicide + Homicide + Other * Legal intervention * Unintentional firearm (self-inflicted) * Undetermined intent * Other unintentional firearm (another or unknown-inflicted) ] --- ## Datasets Used .pull-left[ - .blue[NVDRS] - Manners of Deaths - Year - State - Sex - Age - Race - Ethnicity .left[.footnote[.blue[National Violent Death Reporting System]] ]] .pull-right[ - .blue[CDC WONDER] - Year - State - Gender - Age Group - Race - Population .left[.footnote[.blue[Wide-ranging ONline Data for Epidemiologic Research]]] ] --- ## Terminology for Race and Ethnicity used Across Sources <table> <thead> <tr> <th style="text-align:left;"> Race/Ethnicity </th> <th style="text-align:left;"> This Report </th> <th style="text-align:left;"> NVDRS </th> <th style="text-align:left;"> WONDER </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Race </td> <td style="text-align:left;"> Asian/Asian American </td> <td style="text-align:left;"> Asian/Pacific Islander </td> <td style="text-align:left;"> Asian and Pacific Islander </td> </tr> <tr> <td style="text-align:left;"> Race </td> <td style="text-align:left;"> Black/African American </td> <td style="text-align:left;"> Black or African American </td> <td style="text-align:left;"> Black or African American </td> </tr> <tr> <td style="text-align:left;"> Race </td> <td style="text-align:left;"> Native American/Alaska Indigenous Resident </td> <td style="text-align:left;"> American Indian/Alaska Native </td> <td style="text-align:left;"> American Indian and Alaska Native </td> </tr> <tr> <td style="text-align:left;"> Race </td> <td style="text-align:left;"> White </td> <td style="text-align:left;"> White </td> <td style="text-align:left;"> White </td> </tr> <tr> <td style="text-align:left;"> Ethnicity </td> <td style="text-align:left;"> Hispanic/Latino </td> <td style="text-align:left;"> Hispanic </td> 
<td style="text-align:left;"> Hispanic or Latino </td> </tr> </tbody> </table> --- ## NVDRS State Data - 27 States (43%) <style> .custom h2, .custom p { margin-bottom: 0; } </style>
--- ## Exploratory Data Analysis <img src="data:image/png;base64,#./figures/Homicide_Race_Age.png" width="100%" style="display: block; margin: auto;" /> .caption[Homicide Mortality Rates Per 100,000 From 2016 To 2020 By Age Group And Race ] --- ## Motivation to Go Beyond Exploratory Data Analysis (EDA) .pull-left[ - **Benefits of Statistical Modeling:** .small[ - Offers rigorous & sophisticated analysis - Controls for other variables - Isolates effects of specific factors of interest - Handles confounding variables, e.g., age, sex, race - Enables accurate conclusions ] - **Limitations of EDA Alone:** .small[ - Provides initial data understanding - Identifies patterns - But lacks control for confounding factors - Cannot provide conclusive evidence for complex relationships ] ] .pull-right[ - **Capabilities of Statistical Modeling:** .small[ - Makes predictions based on the model - Tests hypotheses - Examines relationships, e.g., mortality rates vs. types of violent deaths - Informs policies & interventions to reduce violent deaths ] - **Conclusion:** .small[ - EDA is foundational but not conclusive - Statistical modeling is crucial for reliable research & policy decisions ] ] --- ## Methodology - Negative Binomial GLM - `\(Y_i, i=1, 2, \ldots, N\)`: random variables for the number of deaths due to a violent event, with realizations denoted as `\(y_i, i=1, 2, \ldots, N\)`. - Assume `\(Y_i|\mu_i, r_i \sim NegBin(\mu_i, r_i), i=1, 2, \ldots, N\)` - `\(X_i'=(X_{1i}, X_{2i}, \ldots, X_{pi})\)`: a `\(p\)`-dimensional vector of categorical factors with its realization `\(x_i'= (x_{1i}, x_{2i}, \ldots, x_{pi})\)`, `\(i=1, 2, \ldots, N\)`. 
When modeling mortality rates, the negative binomial GLM with a log link can be related to a linear model for the rate response (deaths per 100,000 population) as follows: .center[ `\(\log(\mu_i) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \mbox{offset}(\log(p_i/100000))\)` ] where `\(\mu_i = E(Y_i)\)` and `\(p_i\)` represents the population count associated with the number of deaths `\(Y_i\)` for the `\(i\)`th level of aggregation. --- ## Statistical Analysis - GLM Modeling .pull-left-2[ .small[ <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> MODEL NAME (First 2 Characters) </th> <th style="text-align:center;"> FACTORS </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;width: 4cm; "> M1 </td> <td style="text-align:center;"> Year + State + Sex + Race (or Ethnicity) + Age Group </td> </tr> <tr> <td style="text-align:left;width: 4cm; "> M2 </td> <td style="text-align:center;"> State + Sex + Race (or Ethnicity) + Age Group </td> </tr> <tr> <td style="text-align:left;width: 4cm; "> M3 </td> <td style="text-align:center;"> Sex + Race (or Ethnicity) + Age Group </td> </tr> <tr> <td style="text-align:left;width: 4cm; "> M4 </td> <td style="text-align:center;"> Race (or Ethnicity) + Age Group </td> </tr> <tr> <td style="text-align:left;width: 4cm; "> M5 </td> <td style="text-align:center;"> Sex + Race (or Ethnicity) </td> </tr> <tr> <td style="text-align:left;width: 4cm; "> M6 </td> <td style="text-align:center;"> Race (or Ethnicity) </td> </tr> <tr> <td style="text-align:left;width: 4cm; "> M7 </td> <td style="text-align:center;"> Age Group </td> </tr> <tr> <td style="text-align:left;width: 4cm; "> M8 </td> <td style="text-align:center;"> Race (or Ethnicity) + Age Group + Age Group:Race (or Ethnicity) </td> </tr> <tr> <td style="text-align:left;width: 4cm; "> M9 </td> <td style="text-align:center;"> Age Group + Sex </td> </tr> </tbody> </table> ] ] .pull-right-2[ <img src="data:image/png;base64,#./figures/models.png" width="100%" 
style="display: block; margin: auto;" /> ] --- ## Analysis of the GLM Models - `Step A:` The chi-square goodness-of-fit (GOF) test is applied to all GLM models, and only those that pass this test are considered further. - `Step B:` Model diagnostics are examined for all models. The diagnostic tools include the half-normal plot of residuals and the density plot. Diagnostic plots are analyzed along with the findings in Step A. - `Step C:` All models are additionally tested for overdispersion using the R function check_overdispersion() from the R package performance (0.10.3) (Lüdecke et al. 2021), and models that do not pass the test are disregarded where possible. - `Step D:` The likelihood-ratio test and the Akaike information criterion (see Appendix C) are used to compare pairs of models within the selected subsets of the model space. - `Step E:` The most suitable model is selected by weighing the consistency of the results across all of the above steps. - `Step F:` The selected GLM model is used to explain the results and findings of this study. 
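
The quantities behind Steps C and D can be sketched in a few lines. This is a minimal illustration in Python, not the R workflow used in the study: the function names and all numbers below are hypothetical, the dispersion ratio assumes a Poisson variance function, and the likelihood-ratio p-value is only valid for nested models that differ by a single parameter.

```python
import math


def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square statistic divided by the residual degrees of freedom.

    Assumes variance == mean (Poisson); values well above 1 hint at
    overdispersion. Illustrative only, not the `performance` implementation.
    """
    chi_sq = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_sq / (len(observed) - n_params)


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * log_lik


def likelihood_ratio_test_1df(log_lik_small, log_lik_big):
    """LR statistic and chi-square(1) p-value for two nested models."""
    stat = 2 * (log_lik_big - log_lik_small)
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2))
    return stat, p_value


# Hypothetical counts and fitted means, for illustration only.
counts = [2, 0, 9, 1, 14, 3]
means = [4.0, 3.0, 5.0, 4.0, 8.0, 5.0]
print(pearson_dispersion(counts, means, n_params=2))
print(aic(-100.0, 3))
print(likelihood_ratio_test_1df(-105.0, -100.0))
```

A dispersion ratio far above 1 under a Poisson fit is one common motivation for moving to a negative binomial model, as the deck does.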
--- ## Models Chosen for Suicide, Homicide, and Other <table> <thead> <tr> <th style="text-align:left;"> Model </th> <th style="text-align:center;"> Factors </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> M6-S-R </td> <td style="text-align:center;"> Race </td> </tr> <tr> <td style="text-align:left;"> M8-S-R </td> <td style="text-align:center;"> Age Group + Race + Age Group:Race </td> </tr> <tr> <td style="text-align:left;"> M5-S-E </td> <td style="text-align:center;"> Sex + Ethnicity </td> </tr> <tr> <td style="text-align:left;"> M2-H-R </td> <td style="text-align:center;"> State + Age Group + Sex + Race </td> </tr> <tr> <td style="text-align:left;"> M2-H-E </td> <td style="text-align:center;"> State + Age Group + Sex + Ethnicity </td> </tr> <tr> <td style="text-align:left;"> M2-O-R </td> <td style="text-align:center;"> State + Age Group + Sex + Race + Manner </td> </tr> <tr> <td style="text-align:left;"> M2-O-E </td> <td style="text-align:center;"> State + Age Group + Sex + Ethnicity + Manner </td> </tr> </tbody> </table> --- ## Major Findings - 1 **Suicide:** .small[ - **Race:** - Native Americans have the highest suicide risk across all age groups - Native Americans are over 2x more likely to die by suicide than whites - Blacks/African Americans and Asians/Asian Americans generally have a lower suicide risk than whites - Asians/Asian Americans' suicide risk is ~25% less than whites - **Ethnicity:** - Hispanic/Latino suicide risk is ~30% lower than non-Hispanic/Latinos - **Sex:** - Males have a 3.5x higher suicide risk than females - **Age:** - Suicide risk peaks at ages 45–54 and declines in older age groups - Ages 45–54 have an almost 8x higher risk than ages 10–14 ] --- ## Major Findings - 2 **Homicide:** .small[ - **Race:** - Blacks/African Americans have a 7.5x higher homicide risk than whites - Asians/Asian Americans and Native Americans have a 3x and 4x higher risk, respectively, than whites - **Ethnicity:** - Hispanic/Latinos have a 1.2x 
higher homicide risk than non-Hispanic/Latinos - **Sex:** - Males have a >3x higher homicide risk than females - **Age:** - Peak homicide risk is for ages 25–29 ] --- ## Major Findings - 3 **Other Manners of Violent Death:** .small[ - **Race:** - Native Americans have the highest risk, 5x more likely than whites - Blacks/African Americans have a 2.3x higher risk than whites - Asians/Asian Americans have a 1.6x higher risk than whites - **Ethnicity:** - Hispanic/Latinos have a 1.4x higher risk than non-Hispanic/Latinos - **Sex:** - Males have a 1.5x higher risk than females - **Age:** - Peak risk is for ages 30–34, which is over 2x higher than ages 10–14 ] --- ## Relative Risk by Race or Ethnicity .pull-left[ <img src="data:image/png;base64,#./figures/Risk_Race.png" width="100%" style="display: block; margin: auto;" /> .caption[No other fixed factors are used for suicide; fixed factors for homicide are state, age, and sex; fixed factors for “other” are state, age, sex, and manner.] ] .pull-right[ <img src="data:image/png;base64,#./figures/Risk_Ethnicity.png" width="100%" style="display: block; margin: auto;" /> .caption[Fixed factor for suicide is sex; fixed factors for homicide are state, age, and sex; fixed factors for “other” are state, age, sex, and manner.] 
] --- ## Relative Risks By Ethnicity and Sex in Model M5-S-E <img src="data:image/png;base64,#./figures/Risk_M5SE.png" width="90%" style="display: block; margin: auto;" /> --- ## Limitations of the Study .small[ .pull-left[ **Data Scope & Completeness:** - NVDRS data from only 27 states (2016-2020) - TX, CA, FL, NY excluded due to lack of data - Data quality varies across states & jurisdictions **Factors Examined:** - Limited to sex, age group, race, and ethnicity - Missing: drug use, mental health, firearm access, social status, poverty rate, unemployment, etc **Data Sources:** - Uncertainty with population data from WONDER - Missing raw data access - Small population samples might skew mortality rates ] .pull-right[ **Modeling Approach:** - Model uncertainty - Potential model selection bias - Models useful for relationships but not predictions **Modeling Biases Addressed:** - .green[Overdispersion]: Handled with negative binomial models and goodness-of-fit tests - .green[Temporal Dependence]: Year found insignificant - .green[Spatial Dependence]: Addressed for some death types with state as a factor - .green[Measurement Error]: Exists but not addressed in this study ] ] --- ## Discussion & Conclusion - I **Study Overview:** - GLM models built for violent death mortality rates (2016-2020) in 27 U.S. states - Factors: sex, race, ethnicity, age --- ## Discussion & Conclusion - II **Key Findings:** .small[ - Mortality rates for violent deaths vary between underrepresented vs. 
white communities - Significant determinants: sex, age, race, and ethnicity - "Other" manners of death significant subcategories: - Legal intervention - Unintentional firearm (self-inflicted) - Undetermined intent - Other unintentional firearm deaths (inflicted by another or unknown) - Ethnicity influence: Unintentional firearm deaths not significant for "other" deaths - State-specific findings: - New Mexico & Vermont: Low significance for homicide deaths with race in the model - Maryland, New Mexico, South Carolina, Vermont: Low significance for homicide deaths with ethnicity in the model ] --- ## Discussion & Conclusion - III **Contribution to Literature:** .small[ - Provides a unique GLM modeling approach for NVDRS data - Addresses a gap in mortality studies of violent deaths in the U.S. ] **Limitations:** .small[ - Care required when generalizing results beyond the sample states] **Recommendations for Future Research:** .small[ - Study violent deaths in the remaining states with complete NVDRS data - Investigate COVID-19 pandemic's impact on violent deaths - Focus on all manners of death with emphasis on race, ethnicity, and socioeconomic factors like income, education, and occupation ] --- ## Thanks .pull-left[ - Please do not hesitate to contact me (Tessa Chen) at <a href="mailto:ychen4@udayton.edu"><i class="fa fa-paper-plane fa-fw"></i> ychen4@udayton.edu</a>. - Slides were created via the R package **xaringan**, with styling based on: * [xaringanthemer](https://cran.r-project.org/web/packages/xaringanthemer/vignettes/xaringanthemer.html) package, and * Alison Hill's [@apreshill](https://github.com/apreshill/) CSS resources for customizing themes and fonts - The formatting of slides is provided by Dr. Fadel M. Megahed [@fmegahed](https://github.com/fmegahed). ] .pull-right[ <img src="data:image/png;base64,#./figures/Tessa_grey_G.gif" width="60%" style="display: block; margin: auto;" /> ] --- ## References Wilson, R. F., G. Liu, B. H. Lyons, et al. (2022). 
"Surveillance for violent deaths—National violent death reporting system, 42 States, the District of Columbia, and Puerto Rico, 2019". In: _MMWR Surveillance Summaries_ 71.6, p. 1. --- # Appendix - Suicide Models ## Coefficients: Suicide-Race, Model 6 (M6-S-R) <table> <thead> <tr> <th style="text-align:left;"> Variable </th> <th style="text-align:center;"> Relative Risk </th> <th style="text-align:center;"> Coefficients Estimate </th> <th style="text-align:center;"> Coefficients Standard Error </th> <th style="text-align:center;"> Z Value </th> <th style="text-align:center;"> P Value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Intercept </td> <td style="text-align:center;"> - </td> <td style="text-align:center;"> 2.95 </td> <td style="text-align:center;"> 0.01 </td> <td style="text-align:center;"> 197.77 </td> <td style="text-align:center;"> <0.001 </td> </tr> <tr> <td style="text-align:left;"> Race R2 </td> <td style="text-align:center;"> 63% </td> <td style="text-align:center;"> -0.46 </td> <td style="text-align:center;"> 0.03 </td> <td style="text-align:center;"> -16.55 </td> <td style="text-align:center;"> <0.001 </td> </tr> <tr> <td style="text-align:left;"> Race R3 </td> <td style="text-align:center;"> 76% </td> <td style="text-align:center;"> -0.27 </td> <td style="text-align:center;"> 0.03 </td> <td style="text-align:center;"> -7.77 </td> <td style="text-align:center;"> <0.001 </td> </tr> <tr> <td style="text-align:left;"> Race R4 </td> <td style="text-align:center;"> 212% </td> <td style="text-align:center;"> 0.75 </td> <td style="text-align:center;"> 0.04 </td> <td style="text-align:center;"> 18.24 </td> <td style="text-align:center;"> <0.001 </td> </tr> </tbody> </table> .right[.caption[Baseline for race is white (R1).]] <table> <thead> <tr> <th style="text-align:center;"> DF </th> <th style="text-align:center;"> DEVIATION </th> <th style="text-align:center;"> DISPERSION </th> <th style="text-align:center;"> AIC </th> <th 
style="text-align:center;"> P VALUE </th> <th style="text-align:center;"> 2-LIKELIHOOD </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 5976 </td> <td style="text-align:center;"> 5931.16 </td> <td style="text-align:center;"> 1.87 </td> <td style="text-align:center;"> 37091.11 </td> <td style="text-align:center;"> 0.66 </td> <td style="text-align:center;"> -37081.1 </td> </tr> </tbody> </table> --- ## Diagnostic Plots for Suicide-Race Model 6 (M6-S-R) <br></br> <img src="data:image/png;base64,#./figures/suicide_race_M6.png" width="100%" style="display: block; margin: auto;" /> .right[.caption[Blue line: empirical density from the data; red line: density from the fitted model]] --- ## Coefficients: Suicide-Ethnicity, Model 5 (M5-S-E) <table> <thead> <tr> <th style="text-align:left;"> Variable </th> <th style="text-align:center;"> Relative Risk </th> <th style="text-align:center;"> Coefficients Estimate </th> <th style="text-align:center;"> Coefficients Standard Error </th> <th style="text-align:center;"> Z Value </th> <th style="text-align:center;"> P Value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Intercept </td> <td style="text-align:center;"> - </td> <td style="text-align:center;"> 2.13 </td> <td style="text-align:center;"> 0.02 </td> <td style="text-align:center;"> 139.85 </td> <td style="text-align:center;"> <0.001 </td> </tr> <tr> <td style="text-align:left;"> Ethnicity E2 </td> <td style="text-align:center;"> 72% </td> <td style="text-align:center;"> -0.33 </td> <td style="text-align:center;"> 0.02 </td> <td style="text-align:center;"> -14.00 </td> <td style="text-align:center;"> <0.001 </td> </tr> <tr> <td style="text-align:left;"> Sex Male </td> <td style="text-align:center;"> 353% </td> <td style="text-align:center;"> 1.26 </td> <td style="text-align:center;"> 0.02 </td> <td style="text-align:center;"> 63.94 </td> <td style="text-align:center;"> <0.001 </td> </tr> </tbody> </table> .right[.caption[Baseline for 
ethnicity is non-Hispanic/Latino (E1), for sex is female (F).]] <table> <thead> <tr> <th style="text-align:center;"> DF </th> <th style="text-align:center;"> DEVIATION </th> <th style="text-align:center;"> DISPERSION </th> <th style="text-align:center;"> AIC </th> <th style="text-align:center;"> P VALUE </th> <th style="text-align:center;"> 2-LIKELIHOOD </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 3607 </td> <td style="text-align:center;"> 3690.25 </td> <td style="text-align:center;"> 4.31 </td> <td style="text-align:center;"> 25523.71 </td> <td style="text-align:center;"> 0.16 </td> <td style="text-align:center;"> -25515.71 </td> </tr> </tbody> </table> --- ## Diagnostic Plots for Suicide-Ethnicity Model 5 (M5-S-E) <br></br> <img src="data:image/png;base64,#./figures/suicide_ethnicity_M5.png" width="100%" style="display: block; margin: auto;" /> .right[.caption[Blue line: empirical density from the data; red line: density from the fitted model]] --- ## Coefficients: Suicide-Race, Model 8 (M8-S-R) <img src="data:image/png;base64,#./figures/suicide_race_M8_coe.png" width="100%" style="display: block; margin: auto;" /> --- ## Diagnostic Plots for Suicide-Race Model 8 (M8-S-R) <br></br> <img src="data:image/png;base64,#./figures/suicide_race_M8.png" width="100%" style="display: block; margin: auto;" /> .right[.caption[Blue line: empirical density from the data; red line: density from the fitted model]] --- ## Coefficients: Homicide-Race, Model 2 (M2-H-R) <img src="data:image/png;base64,#./figures/homicide_race_M2_coe.png" width="100%" style="display: block; margin: auto;" /> --- ## Diagnostic Plots for Homicide-Race Model 2 (M2-H-R) <br></br> <img src="data:image/png;base64,#./figures/homicide_race_M2.png" width="100%" style="display: block; margin: auto;" /> .right[.caption[Blue line: empirical density from the data; red line: density from the fitted model]] --- ## Coefficients: Homicide-Ethnicity, Model 2 (M2-H-E) <img 
src="data:image/png;base64,#./figures/homicide_ethnicity_M2_coe.png" width="100%" style="display: block; margin: auto;" /> --- ## Diagnostic Plots for Homicide-Ethnicity Model 2 (M2-H-E) <br></br> <img src="data:image/png;base64,#./figures/homicide_ethnicity_M2.png" width="100%" style="display: block; margin: auto;" /> .right[.caption[Blue line: empirical density from the data; red line: density from the fitted model]] --- ## Coefficients: Other-Race, Model 2 (M2-O-R) <img src="data:image/png;base64,#./figures/other_race_M2_coe.png" width="100%" style="display: block; margin: auto;" /> --- ## Diagnostic Plots for Other-Race Model 2 (M2-O-R) <br></br> <img src="data:image/png;base64,#./figures/other_race_M2.png" width="100%" style="display: block; margin: auto;" /> .right[.caption[Blue line: empirical density from the data; red line: density from the fitted model]] --- ## Coefficients: Other-Ethnicity, Model 2 (M2-O-E) <img src="data:image/png;base64,#./figures/other_ethnicity_M2_coe.png" width="100%" style="display: block; margin: auto;" /> --- ## Diagnostic Plots for Other-Ethnicity Model 2 (M2-O-E) <br></br> <img src="data:image/png;base64,#./figures/other_ethnicity_M2.png" width="100%" style="display: block; margin: auto;" /> .right[.caption[Blue line: empirical density from the data; red line: density from the fitted model]]
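
---
## Appendix Note: Reading the Coefficient Tables

The relative-risk column in the suicide tables appears consistent with exponentiating the coefficient estimate under a log link. The sketch below recomputes it from the rounded estimates shown in the M6-S-R and M5-S-E tables, so small rounding differences are possible. It also checks the AIC values under two assumptions that are not stated in the slides: that the "2-LIKELIHOOD" column holds twice the log-likelihood, and that the dispersion parameter counts as one estimated parameter.

```python
import math

# Rounded coefficient estimates copied from the appendix tables.
coefficients = {
    "M6-S-R Race R2": -0.46,
    "M6-S-R Race R3": -0.27,
    "M6-S-R Race R4": 0.75,
    "M5-S-E Ethnicity E2": -0.33,
    "M5-S-E Sex Male": 1.26,
}

# Relative risk versus the baseline level, expressed as a percentage.
relative_risk_pct = {
    name: round(100 * math.exp(beta)) for name, beta in coefficients.items()
}


def aic_from_two_loglik(two_log_lik, n_params):
    """AIC = 2k - 2 * log-likelihood, given twice the log-likelihood."""
    return 2 * n_params - two_log_lik


for name, pct in relative_risk_pct.items():
    print(f"{name}: {pct}%")

# Assumed parameter counts: regression coefficients plus one dispersion term.
print(aic_from_two_loglik(-37081.1, n_params=5))   # M6-S-R
print(aic_from_two_loglik(-25515.71, n_params=4))  # M5-S-E
```

For example, the Race R4 estimate of 0.75 gives exp(0.75) ≈ 2.12, which matches the 212% in the table and the "over 2x" statement for Native Americans in Major Findings - 1.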